Data-Transfer-Aware Memory Allocation for Dynamically Reconfigurable Accelerators in Heterogeneous Multicore Processors
نویسندگان
چکیده
Accelerator cores in low-power heterogeneous multicore processors have multiple memory modules to enable parallel data access. Recent low-power processors contain address generation units (AGUs) for fast address generation. To reduce the core-area, small functional units such as adders and counters are used in AGUs. Such small functional units make it difficult to implement complex addressing patterns without duplicating the data among multiple memory modules. The data-duplication wastes the memory capacity and increases the data transfer time significantly. This paper proposes a method to remove the memory duplication and to increase the degree of parallelism. To verify the effectiveness of this method, we use window-based media processing which is widely used in many applications. According to the evaluation, the proposed method reduces the total processing time by 14% to more than 85% compared to the previous works.
منابع مشابه
Data-Transfer-Aware Design of an FPGA-Based Heterogeneous Multicore Platform with Custom Accelerators
For an FPGA-based heterogeneous multicore platform, we present the design methodology to reduce the total processing time considering data-transfer. The reconfigurability of recent FPGAs with hard CPU cores allows us to realize a single-chip heterogeneous processor optimized for a given application. The major problem in designing such heterogeneous processors is data-transfer between CPU cores ...
متن کاملA Balanced Programming Model for Emerging Heterogeneous Multicore Systems
Computer systems are moving towards a heterogeneous architecture with a combination of one or more CPUs and one or more accelerator processors. Such heterogeneous systems pose a new challenge to the parallel programming community. Languages such as OpenCL and CUDA provide a program environment for such systems. However, they focus on data parallel programming where the majority of computation i...
متن کاملMemory-Access-Driven Context Partitioning for Window-Based Image Processing on Heterogeneous Multicore Processors
Accelerator cores in low-power heterogeneous processors have on-chip local memories to enable parallel data access. The memory capacities of the local memories are very small. Therefore, the data should be transferred from the global memory to the local memories many times. These data transfers greatly increase the total processing time. Memory allocation technique to increase the data sharing ...
متن کاملAcceleration of Block Matching on a Low-Power Heterogeneous Multi-Core Processor Based on DTU Data-Transfer with Data Re-Allocation
The large data-transfer time among different cores is a big problem in heterogeneous multi-core processors. This paper presents a method to accelerate the data transfers exploiting data-transfer-units together with complex memory allocation. We used block matching, which is very common in image processing, to evaluate our technique. The proposed method reduces the data-transfer time by more tha...
متن کاملHierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement
Modern computer systems feature multiple homogeneous or heterogeneous computing units with deep memory hierarchies, and expect a high degree of thread-level parallelism from the software. Exploitation of data locality is critical to achieving scalable parallelism, but adds a significant dimension of complexity to performance optimization of parallel programs. This is especially true for program...
متن کامل